Abstract

We present an efficient encoding of core Java constructs in
a simple, implementable typed intermediate language. The
encoding, after type erasure, has the same operational behavior
as a standard implementation using vtables and self-application
for method invocation. Classes inherit super-class
methods with no overhead. We support mutually recursive
classes while preserving separate compilation. Our strategy
extends naturally to a significant subset of Java, including
interfaces and privacy. The formal translation using Featherweight
Java allows comprehensible type-preservation proofs
and serves as a starting point for extending the translation to
new features.

1  Introduction 

Many  compilation  techniques  for  functional  languages  focus 
on  type-directed  compilation  [22,  25,  30].  Source-level  types 
are  transformed  along  with  the  program  and  then  used  to 
guide  and  justify  advanced  optimizations.  More  generally, 
types  preserved  throughout  compilation  can  be  used  to  reason 
about  the  safety  and  security  of  object  code  [21,  23,  24]. 
Recently,  several  researchers  have  attempted  to  bring  these 
benefits  to  object-oriented  languages  [7,  12,  18,  32].  Last 
year’s  FOOL  workshop  even  featured  a  panel  discussion  on 
typed  intermediate  languages. 

These intermediate languages are typically based on typed
λ-calculi. There is significant precedent for encoding
object-oriented languages in typed λ-calculi [2, 4, 5, 6, 9], but this
domain (type-preserving compilation) imposes several new
requirements and allows us to reject a few traditional assumptions.
The intermediate language must provide extremely
simple primitives (that correspond, e.g., to at most several
machine instructions), so that our encodings are amenable to
optimization. We must avoid introducing any dynamic overhead
solely to achieve static typing. In addition, the type
system should be as simple as possible, so that type checking
is efficient in practice. Subsumption is not required; it can
be replaced with explicit coercions, as long as their runtime
cost is nil. In an intermediate language we are not concerned
with syntactic niceties or resemblance to source-level constructs.
Finally, a type-preserving compiler should preserve
source-level abstractions. Link-time type checking will not
prevent, e.g., one class from accessing the private fields of
another unless the abstractions are preserved in the object
code.

The main contribution of this paper is an efficient encoding
of key Java™ [13] constructs in a simple, implementable
typed intermediate language. After type erasure, our code has
the same operational behavior as a standard implementation
using self-application for method invocation. Our strategy
extends naturally to a significant subset of Java and an
implementation is in progress.

This paper extends and improves our previous work [18]
in four significant ways. First, it supports mutually recursive
classes. Java allows classes to depend on one another’s types
and components in ways that test the limitations of the SML
module system. Our solution maintains separate compilation
of classes. Second, we give a complete implementation of
dynamic casts (another challenge for type theory) without
using an imperative tag generator. Again, our solution is
compatible with separate compilation. Third, the small source
calculus we use allows comprehensible proofs of interesting
formal properties of the translation, such as type preservation.
Finally, the core translation presented here is an effective
starting point for designing encodings of and proving
properties about interesting source language extensions, such
as privacy, genericity, and reflection.

We  describe  the  syntax  and  semantics  of  our  source  and 
target  languages  in  the  next  two  sections.  In  section  4,  we 
explain  and  formalize  each  aspect  of  our  translation  and  prove 
that  it  preserves  types.  Section  5  discusses  several  extensions, 
focusing  on  a  tricky  but  tractable  interaction  between  mutual 
recursion  and  privacy.  We  contrast  our  technique  with  recent 
related  work  in  section  6. 

2  Source  language 

The source language for our translation is Featherweight Java
(FJ), a “minimal core calculus for modeling Java’s type system”
[16]. The syntax is given in figure 1; for reference, we
reprint the semantics in appendix A.

Class declarations (CL) contain the names of the new class
and its super class, a sequence of field declarations, a
constructor (K), and a sequence of method declarations (M). We
use letters A through E to range over class names, f and g
to range over field names, m over method names, and x over
other variables. There are five forms of expressions: variables,
field selection, method invocation, object creation, and cast.
A program (CT, e) consists of a fixed class table, CT, mapping
class names to declarations, and a main program expression e.

Figure 1: Syntax of Featherweight Java: classes, constructors,
methods, and expressions.

There are no assignments, interfaces, super calls, exceptions,
or access control. Constructors always take all the fields
as arguments, in the correct order. FJ permits recursive class
dependencies with the full generality of Java. A class can refer
to types and call constructors of any other class, including its
sub-classes. While this does not complicate the FJ semantics,
it is one of the major challenges of our translation.

The subtype relation <: is the reflexive, transitive closure
of the super class declarations (class C $\triangleleft$ B). The relation
fields(C) returns the sequence of all the fields found in objects
of class C. The relation mtype(m, C) finds the type signature
for method m in class C by searching up the hierarchy. Type
signatures have the form $D_1 \ldots D_n \to D_0$.
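The three relations just described can be sketched as ordinary functions over a class table. The following is a minimal illustration, not the formal FJ definitions: the table contents (classes A, B, Pair, Triple) and the tuple representation are assumptions made for the example, and Obj stands for the root class.

```python
# Hypothetical class table: name -> (superclass, own fields, own methods).
# "Obj" plays the role of the root class and has no entry of its own.
CT = {
    "A":      ("Obj",  [], {}),
    "B":      ("Obj",  [], {}),
    "Pair":   ("Obj",  [("fst", "Obj"), ("snd", "Obj")],
               {"setfst": (("Obj",), "Pair")}),
    "Triple": ("Pair", [("thd", "Obj")], {}),
}


def is_subtype(c, d):
    """C <: D: reflexive, transitive closure of the superclass declarations."""
    while True:
        if c == d:
            return True
        if c == "Obj":
            return False
        c = CT[c][0]


def fields(c):
    """All fields found in objects of class C, superclass fields first."""
    if c == "Obj":
        return []
    superclass, own_fields, _ = CT[c]
    return fields(superclass) + own_fields


def mtype(m, c):
    """Signature (argument classes, result class) of m, searching up from C."""
    while c != "Obj":
        superclass, _, methods = CT[c]
        if m in methods:
            return methods[m]
        c = superclass
    return None  # m is not defined in C or in any ancestor
```

Here mtype("setfst", "Triple") finds the signature declared in Pair, because the search walks up the hierarchy from Triple.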

The expression typing rules govern judgments of the form
$\Gamma \vdash e \in C$, meaning that FJ expression e is of type C in
context $\Gamma$. The operational semantics are given by three primitive
reduction rules and the expected congruence rules. Since
there are no side effects, evaluation order is unspecified. The
FJ type system is sound and decidable. Please see the appendix
for the rules, or [16] for further explanation.

3  Target  language 

The target language of our translation is the higher-order
polymorphic λ-calculus $F_\omega$ [11, 29] extended with type tuples,
existential types [20], row polymorphism [27], ordered
records, sum types, iso-recursive types, and a term-level
fixpoint for constructing recursive records. The syntax appears
in figure 2; typing rules for the non-standard features are
given in figure 3.

Labeled tuples of types are enclosed in braces, $\{l = \tau \ldots\}$,
and have tuple kinds $\{l :: \kappa \ldots\}$. Their components are
selected using a mid-dot: $\tau \cdot l$. The existential types are
standard: introduced by the package construct
$\langle \alpha{::}\kappa = \tau,\ e : \tau' \rangle$
and eliminated (within some restricted scope) by open; see
rules (1) and (2).

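Following Rémy [27] we introduce a kind of rows $R^{L}$,
where $L$ is the set of labels banned from the row. $\mathsf{Abs}^{L}$ is
an empty row of kind $R^{L}$, and $l : \tau ; \tau'$ prepends field $l$ of
type $\tau$ onto the row $\tau'$. The row formation rules (3) and (4)
prohibit duplicate labels: $\forall\alpha{::}R^{\{x\}}.\ \tau$ cannot be instantiated
with a row in which $x$ is already bound. Boldface braces
$\boldsymbol{\{}\,\cdot\,\boldsymbol{\}}$ denote the record constructor, which lifts a complete
row type (of kind $R^{\emptyset}$) to kind Type. Permutations of rows
are not considered equivalent, so record selection $e.l$ can be
compiled using fixed offsets. We sometimes use commas and
omit $\mathsf{Abs}^{L}$ when specifying complete rows (see the derived
forms in figure 2). We let $\mathbf{1}$ (read ‘unit’) denote the empty
record type.

\[
\begin{array}{lrcl}
\text{Kinds} & \kappa & ::= & \mathsf{Type} \mid R^{L} \mid \kappa \Rightarrow \kappa' \mid \{(l :: \kappa)^{*}\} \\[4pt]
\text{Types} & \tau & ::= & \alpha \mid \lambda\alpha{::}\kappa.\ \tau \mid \tau\ \tau' \mid \{(l = \tau)^{*}\} \mid \tau \cdot l \\
 & & \mid & \tau \to \tau' \mid \mathsf{Abs}^{L} \mid l : \tau ; \tau' \mid \boldsymbol{\{}\tau\boldsymbol{\}} \mid \boldsymbol{[}\tau\boldsymbol{]} \\
 & & \mid & \mu\alpha{::}\kappa.\ \tau \mid \forall\alpha{::}\kappa.\ \tau \mid \exists\alpha{::}\kappa.\ \tau \\[4pt]
\text{Terms} & e & ::= & x \mid \lambda x{:}\tau.\ e \mid e\ e' \mid \{(l = e)^{*}\} \mid e.l \mid \mathsf{fix}\ [\tau]\ e \\
 & & \mid & \mathsf{inj}^{\tau}_{l}\ e \mid \mathsf{case}\ e\ \mathsf{of}\ (l\ x \Rightarrow e)^{*}\ \mathsf{else}\ e \\
 & & \mid & \mathsf{fold}\ e\ \mathsf{as}\ \tau\ \mathsf{at}\ l \mid \mathsf{unfold}\ e\ \mathsf{as}\ \tau\ \mathsf{at}\ l \\
 & & \mid & \Lambda\alpha{::}\kappa.\ e \mid e\ [\tau] \mid \langle \alpha{::}\kappa = \tau,\ e : \tau' \rangle \\
 & & \mid & \mathsf{open}\ e\ \mathsf{as}\ \langle \alpha{::}\kappa,\ x : \tau \rangle\ \mathsf{in}\ e'
\end{array}
\]

Derived forms:

\[
\begin{array}{rcl}
l_1 : \tau_1, \ldots, l_n : \tau_n & = & l_1 : \tau_1 ; \ldots ; l_n : \tau_n ; \mathsf{Abs}^{\{l_1 \ldots l_n\}} \\
\mathbf{1} & = & \boldsymbol{\{}\mathsf{Abs}^{\emptyset}\boldsymbol{\}} \\
\mathsf{maybe} & = & \lambda\alpha{::}\mathsf{Type}.\ \boldsymbol{[}\mathsf{some} : \alpha,\ \mathsf{none} : \mathbf{1}\boldsymbol{]} \\
\mathsf{some} & = & \Lambda\alpha{::}\mathsf{Type}.\ \lambda x{:}\alpha.\ \mathsf{inj}^{\mathsf{maybe}\ \alpha}_{\mathsf{some}}\ x \\
\mathsf{none} & = & \Lambda\alpha{::}\mathsf{Type}.\ \mathsf{inj}^{\mathsf{maybe}\ \alpha}_{\mathsf{none}}\ \{\} \\
\mathsf{let}\ x : \tau = e\ \mathsf{in}\ e' & = & (\lambda x{:}\tau.\ e')\ e
\end{array}
\]

Figure 2: Syntax of the target language.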

Labeled sum types are constructed by enclosing a complete
row within boldface brackets: $\boldsymbol{[}\,\cdot\,\boldsymbol{]}$. Sum types are
introduced by a term-level injection and eliminated by an ML-like
case statement; see rules (8) and (9). Figure 2 defines a
parameterized type maybe with constructors some and none.

We  use  iso-recursive  types  at  higher  kinds.  The  rules  for 
folding  and  unfolding  them  are  unconventional,  and  deserve 
further  explanation.  Suppose  we  wish  to  encode  the  following 
mutually  recursive  type  abbreviations: 

type  even  =  maybe  {hd  :  int,  tl :  odd} 
type  odd  =  {hd  :  int,  tl :  even} 

The solution is expressed as the fixpoint over a tuple:

\[
\begin{array}{l}
f = \mu\alpha{::}\{\mathsf{even} :: \mathsf{Type},\ \mathsf{odd} :: \mathsf{Type}\}. \\
\qquad \{\mathsf{even} = \mathsf{maybe}\ \{\mathsf{hd} : \mathsf{int},\ \mathsf{tl} : \alpha \cdot \mathsf{odd}\}, \\
\qquad \ \mathsf{odd} = \{\mathsf{hd} : \mathsf{int},\ \mathsf{tl} : \alpha \cdot \mathsf{even}\}\}
\end{array}
\]

Now, the two recursive types are expressed as $f \cdot \mathsf{even}$ and
$f \cdot \mathsf{odd}$. There are, however, no type equivalence rules for
reducing $f \cdot \mathsf{even}$; a term having this type must first be unfolded.
We allow unfolding of recursive types within a tuple by
specifying a label after the at keyword. If e has type $f \cdot \mathsf{odd}$, then the
expression unfold e as f at odd has type $\{\mathsf{hd} : \mathsf{int},\ \mathsf{tl} : f \cdot \mathsf{even}\}$.
For recursive types of kind Type, we simply omit the at
clause. To conserve space, we sometimes omit type annotations
that can be readily inferred, writing, e.g., unfold e for
unfold e as $\tau$ when e has type $\tau$.
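The unfolding step can be made concrete with a small substitution function over type expressions. This is only a sketch under assumed representations: types are nested Python tuples, the constructor helpers (var, proj, maybe, record, mu) are hypothetical names introduced here, and only the type of an unfold at a label is computed, not the full type checker.

```python
INT = ("int",)

# Hypothetical constructors for a tiny type language (nested tuples).
def var(name):        return ("var", name)
def proj(t, label):   return ("proj", t, label)          # t·label
def maybe(t):         return ("maybe", t)
def record(**fields): return ("record", tuple(fields.items()))
def mu(name, **body): return ("mu", name, tuple(body.items()))


def subst(t, name, repl):
    """Replace free occurrences of variable `name` in t by repl.

    repl is assumed to be closed, so no variable capture can occur.
    """
    tag = t[0]
    if tag == "var":
        return repl if t[1] == name else t
    if tag == "proj":
        return ("proj", subst(t[1], name, repl), t[2])
    if tag == "maybe":
        return ("maybe", subst(t[1], name, repl))
    if tag == "record":
        return ("record", tuple((l, subst(ty, name, repl)) for l, ty in t[1]))
    if tag == "mu":
        if t[1] == name:      # name is rebound here; nothing free below
            return t
        return ("mu", t[1], tuple((l, subst(ty, name, repl)) for l, ty in t[2]))
    return t                  # base types such as int


def unfold_type(rec, label):
    """Type of `unfold e as rec at label`, where e has type rec·label."""
    _, name, body = rec
    return subst(dict(body)[label], name, rec)


# The fixpoint over the tuple {even, odd} from the text.
F = mu("a",
       even=maybe(record(hd=INT, tl=proj(var("a"), "odd"))),
       odd=record(hd=INT, tl=proj(var("a"), "even")))
```

Calling unfold_type(F, "odd") selects the odd component of the body and replaces the bound variable by F itself, which yields the record type with tail F·even described above.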
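Pack and open for existential types:

\[
\frac{\Phi,\ \alpha{::}\kappa \vdash \tau :: \mathsf{Type} \qquad \Phi \vdash \tau' :: \kappa \qquad \Phi;\ \Delta \vdash e : \tau[\alpha := \tau']}
     {\Phi;\ \Delta \vdash \langle \alpha{::}\kappa = \tau',\ e : \tau \rangle : \exists\alpha{::}\kappa.\ \tau}
\qquad (1)
\]

\[
\frac{\Phi;\ \Delta \vdash e : \exists\alpha{::}\kappa.\ \tau \qquad \Phi \vdash \tau' :: \mathsf{Type} \qquad \Phi,\ \alpha{::}\kappa;\ \Delta,\ x : \tau \vdash e' : \tau'}
     {\Phi;\ \Delta \vdash \mathsf{open}\ e\ \mathsf{as}\ \langle \alpha{::}\kappa,\ x : \tau \rangle\ \mathsf{in}\ e' : \tau'}
\qquad (2)
\]

Row and record types:

\[
\frac{\vdash \Phi\ \text{kind env}}
     {\Phi \vdash \mathsf{Abs}^{L} :: R^{L}}
\qquad (3)
\]

\[
\frac{\Phi \vdash \tau :: \mathsf{Type} \qquad \Phi \vdash \tau' :: R^{L \cup \{l\}}}
     {\Phi \vdash l : \tau ; \tau' :: R^{L - \{l\}}}
\qquad (4)
\]

\[
\frac{\Phi \vdash \tau :: R^{\emptyset}}
     {\Phi \vdash \boldsymbol{\{}\tau\boldsymbol{\}} :: \mathsf{Type}}
\qquad (5)
\]

Recursive record term:

\[
\frac{\Phi;\ \Delta \vdash e : \boldsymbol{\{}\tau\boldsymbol{\}} \to \boldsymbol{\{}\tau\boldsymbol{\}}}
     {\Phi;\ \Delta \vdash \mathsf{fix}\ [\tau]\ e : \boldsymbol{\{}\tau\boldsymbol{\}}}
\qquad (6)
\]

Sum type, its introduction and elimination:

\[
\frac{\Phi \vdash \tau :: R^{\emptyset}}
     {\Phi \vdash \boldsymbol{[}\tau\boldsymbol{]} :: \mathsf{Type}}
\qquad (7)
\]

\[
\frac{\Phi \vdash \boldsymbol{[}l_1 : \tau_1 ; \ldots ; l_n : \tau_n ; \tau\boldsymbol{]} :: \mathsf{Type} \qquad \Phi;\ \Delta \vdash e : \tau_i}
     {\Phi;\ \Delta \vdash \mathsf{inj}^{\boldsymbol{[}l_1 : \tau_1 ; \ldots ; l_n : \tau_n ; \tau\boldsymbol{]}}_{l_i}\ e : \boldsymbol{[}l_1 : \tau_1 ; \ldots ; l_n : \tau_n ; \tau\boldsymbol{]}}
\qquad (8)
\]

\[
\frac{\begin{array}{c}
\Phi;\ \Delta \vdash e : \boldsymbol{[}l_1 : \tau_1 ; \ldots ; l_n : \tau_n ; \tau\boldsymbol{]} \qquad \Phi;\ \Delta \vdash e' : \tau' \\
l'_j = l'_{j'} \Rightarrow j = j' \quad (\forall j, j' \in \{1 \ldots m\}) \\
\exists i \in \{1 \ldots n\} : l_i = l'_j \ \text{and}\ \Phi;\ \Delta,\ x_j : \tau_i \vdash e_j : \tau' \quad (\forall j \in \{1 \ldots m\})
\end{array}}
     {\Phi;\ \Delta \vdash \mathsf{case}\ e\ \mathsf{of}\ (l'_j\ x_j \Rightarrow e_j)^{j \in \{1 \ldots m\}}\ \mathsf{else}\ e' : \tau'}
\qquad (9)
\]

Fold and unfold for recursive types:

\[
\frac{\begin{array}{c}
\Phi,\ \alpha{::}\kappa \vdash \tau :: \kappa \qquad \kappa = \{l_1 :: \kappa_1 \ldots l_n :: \kappa_n\} \\
\Phi;\ \Delta \vdash e : \tau \cdot l_i[\alpha := \mu\alpha{::}\kappa.\ \tau]
\end{array}}
     {\Phi;\ \Delta \vdash \mathsf{fold}\ e\ \mathsf{as}\ \mu\alpha{::}\kappa.\ \tau\ \mathsf{at}\ l_i : (\mu\alpha{::}\kappa.\ \tau) \cdot l_i}
\qquad (10)
\]

\[
\frac{\begin{array}{c}
\Phi,\ \alpha{::}\kappa \vdash \tau :: \kappa \qquad \kappa = \{l_1 :: \kappa_1 \ldots l_n :: \kappa_n\} \\
\Phi;\ \Delta \vdash e : (\mu\alpha{::}\kappa.\ \tau) \cdot l_i
\end{array}}
     {\Phi;\ \Delta \vdash \mathsf{unfold}\ e\ \mathsf{as}\ \mu\alpha{::}\kappa.\ \tau\ \mathsf{at}\ l_i : \tau \cdot l_i[\alpha := \mu\alpha{::}\kappa.\ \tau]}
\qquad (11)
\]

Figure 3: Selected typing rules for the target language. The judgments represented are type formation $\Phi \vdash \tau :: \kappa$ and term
formation $\Phi;\ \Delta \vdash e : \tau$, where $\Phi$ maps type variables to their kinds and $\Delta$ maps term variables to their types.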


In addition to the rules in figure 3, the static semantics
includes formation rules for all other syntactic forms and
judgments for environment formation and type equivalence.
All static judgments are decidable. The type system is sound
with respect to a structured operational semantics. The target
language also enjoys a type erasure property: type manipulations
(e.g., type abstractions, folds, pack/open) can be erased
before runtime without affecting the result. Complete details
will be available in a companion technical report. The
implementation of the target language should be quite practical; it
is but a minor extension of FLINT, the intermediate language
already in wide use in the SML/NJ compiler [31].

4  Translation 

Each FJ class is separately compiled into a closed term
which imports the types, method tables, and constructors of
other classes and produces its own method table and constructor.
The compilation units are then instantiated and
linked together with a term-level fixpoint constructor.

We  begin  this  section  by  describing  and  formalizing  our 
basic  object  encoding.  In  section  4.2,  we  give  a  type-directed 
translation  of  FJ  expressions.  Inheritance,  overriding,  and 
constructors  are  examined  as  part  of  the  class  encoding  in 
section  4.3.  Finally,  section  4.4  covers  linking.  Many  aspects 
of  the  translation  are  mutually  dependent,  but  we  believe  this 
ordering  yields  a  reasonably  coherent  explanation. 


4.1  Object  encoding 

The standard explanation of method invocation in terms
of records and fields is called self application [17]. In a
class-based language, the object record contains values for
all the fields plus a pointer to a record of methods, called
the vtable. The vtable is created once and shared among
all objects of the same class. The methods in the vtable
expect the object itself as an argument. Suppose class
Point has one integer field x and one method getx to retrieve
it. Ignoring types for the moment, the term
$p_0 = \{\mathsf{vtab} = \{\mathsf{getx} = \lambda\mathsf{self}.\ (\mathsf{self}.x)\},\ x = 42\}$
could be an instance of class Point. The self-application term
$p_0.\mathsf{vtab}.\mathsf{getx}\ p_0$ invokes the method.
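Ignoring types, as the paragraph above does, self application can be sketched with plain dictionaries. The names point_vtab, q0, and invoke are introduced here for illustration only; the sketch shows the untyped runtime behavior, not the typed encoding developed below.

```python
# One vtable per class, created once; every method takes the object itself.
point_vtab = {"getx": lambda self: self["x"]}

# Object records: a pointer to the shared vtable plus the field values.
p0 = {"vtab": point_vtab, "x": 42}
q0 = {"vtab": point_vtab, "x": 7}


def invoke(obj, method, *args):
    """Self application: select the method from obj's vtable, pass obj back in."""
    return obj["vtab"][method](obj, *args)
```

Both p0 and q0 point at the same point_vtab object, so the method record is allocated once per class rather than once per object.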

What type can we assign to the self argument? The
typing derivation for the self application term forces it to
match the type of the object record itself. That is, well-typed
self application requires that $p_0$ have type $\tau$ where
$\tau = \{\mathsf{vtab} : \{\mathsf{getx} : \tau \to \mathsf{int}\},\ x : \mathsf{int}\}$.
Because $\tau$ appears in its
own definition, the solution must involve a fixpoint. The
recursive types in our target language will suffice if augmenting
the code with fold and unfold annotations allows for a proper
typing derivation. Let the type of self be

\[
\tau_{pt} = \mu\,\mathsf{self}.\ \{\mathsf{vtab} : \{\mathsf{getx} : \mathsf{self} \to \mathsf{int}\},\ x : \mathsf{int}\}
\]

Happily, the folded object term

\[
p_1 = \mathsf{fold}\ \{\mathsf{vtab} = \{\mathsf{getx} = \lambda\mathsf{self} : \tau_{pt}.\ (\mathsf{unfold}\ \mathsf{self}).x\},\ x = 42\}\ \mathsf{as}\ \tau_{pt}
\]

is well-typed, as is the augmented self-application term:
$(\mathsf{unfold}\ p_1).\mathsf{vtab}.\mathsf{getx}\ p_1$.

Suppose class ColorPoint extends Point with an additional
field and method. The type of an object of class ColorPoint
would be:

\[
\tau_{cp} = \mu\,\mathsf{self}.\ \{\mathsf{vtab} : \{\mathsf{getx} : \mathsf{self} \to \mathsf{int},\ \mathsf{getc} : \mathsf{self} \to \mathsf{color}\},\ x : \mathsf{int},\ c : \mathsf{color}\}
\]

How do we relate these two types? That is, how does a
function expecting a Point accept a ColorPoint? Traditional
models employ subsumption: in a calculus with subtyping, extended
with recursive types and a ‘top’ subtyping rule, $\tau_{cp} \le \tau_{pt}$.
We favor explicit (but erasable) type manipulations over subsumption. While
it may be possible to implement the necessary subtyping
relationships in a calculus of coercions [8], we have meanwhile
developed an effective, efficient encoding using more standard,
conservative extensions to $F_\omega$.

Java programmers distinguish the static and dynamic
classes of an object: declared types indicate static classes;
constructors provide dynamic classes. Static classes of a given
object differ at different program points; dynamic classes are
unchanging. Static classes are known at compile-time; dynamic
classes are revealed at run-time only by reflection and
dynamic casts.

We implement this distinction via a pair of existentially-quantified
rows. Some prefix of the object record is known;
the rest is hidden, abstract. Consider this static type of a
Point object:

\[
\begin{array}{l}
\exists\mathsf{tail}{::}\{f :: R^{\{\mathsf{vtab},\,x\}},\ m :: \mathsf{Type} \Rightarrow R^{\{\mathsf{getx}\}}\}. \\
\quad \mu\,\mathsf{self}.\ \{\mathsf{vtab} : \{\mathsf{getx} : \mathsf{self} \to \mathsf{int} ;\ \mathsf{tail} \cdot m\ \mathsf{self}\} ; \\
\qquad\qquad\ \ x : \mathsf{int} ; \\
\qquad\qquad\ \ \mathsf{tail} \cdot f\}
\end{array}
\]

The f component of the tuple tail denotes a hidden row missing
the labels vtab and x. Subclasses of Point append new
fields by packaging non-trivial rows into the witness type.
Similarly, tail contains a component m for appending new
methods onto the vtable. This component is a type operator
expecting the recursive self type, so that it can be propagated
to method types in the dynamic class. The Point object $p_1$
can be packaged into a term of this type using the trivial witness
type $\{f = \mathsf{Abs}^{\{\mathsf{vtab},\,x\}},\ m = \lambda s{::}\mathsf{Type}.\ \mathsf{Abs}^{\{\mathsf{getx}\}}\}$. A ColorPoint
object would include a non-trivial witness type to append the
new field and method:

\[
\begin{array}{l}
\{f = (c : \mathsf{color} ;\ \mathsf{Abs}^{\{\mathsf{vtab},\,x,\,c\}}), \\
\ \, m = \lambda s{::}\mathsf{Type}.\ (\mathsf{getc} : s \to \mathsf{color} ;\ \mathsf{Abs}^{\{\mathsf{getx},\,\mathsf{getc}\}})\}
\end{array}
\]

Now, objects of different dynamic classes can be repackaged
into the type of a common super class.
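After type erasure, the known prefix and hidden tail amount to a layout discipline: code compiled against a static class touches only offsets that every subclass preserves. The following sketch assumes ordered records modeled as Python tuples and hypothetical offset constants; it illustrates the runtime picture only, not the existential typing.

```python
# Fixed offsets shared by Point and every subclass (hypothetical layout).
VTAB, X = 0, 1          # object record: known prefix
GETX = 0                # vtable: known prefix
C, GETC = 2, 1          # ColorPoint's extra field and method slot

point_vtab = (lambda self: self[X],)
colorpoint_vtab = (lambda self: self[X],   # inherited slot keeps its offset
                   lambda self: self[C])   # new slot appended after the prefix

point = (point_vtab, 42)
colorpoint = (colorpoint_vtab, 7, "red")


def getx_via_point(obj):
    """Code compiled against static class Point; uses the known prefix only."""
    return obj[VTAB][GETX](obj)
```

The same getx_via_point works unchanged on colorpoint, which is why repackaging an object at a super class type needs no runtime work.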

This is, in essence, the object encoding we use to compile
Java. Before embarking on the formal translation, we must
explore one more aspect: recursive references. Suppose the
Point class also has a method bump which returns a new
Point. The type of objects of class Point must then refer to
the type of objects of class Point. This recursive reference
calls for another fixpoint, outside the existential:

\[
\begin{array}{l}
\mu\,\mathsf{twin}.\ \exists\mathsf{tail}.\ \mu\,\mathsf{self}.\ \{\mathsf{vtab} : \{\mathsf{getx} : \mathsf{self} \to \mathsf{int} ; \\
\qquad\qquad\qquad\qquad\qquad\quad \mathsf{bump} : \mathsf{self} \to \mathsf{twin} ;\ \mathsf{tail} \cdot m\ \mathsf{self}\} ; \\
\qquad\qquad\qquad\qquad\quad x : \mathsf{int} ;\ \mathsf{tail} \cdot f\}
\end{array}
\]

Using self as the return type would overly constrain
implementations of bump, forcing them to return objects of the
same dynamic class as the receiver. In Java, type signatures
constrain static classes only. Because twin is outside the
existential, its witness type is distinct from that of self.

We used this technique in [18] to explain self-references,
but Java supports mutually recursive references as well.
Suppose class A defines a method returning an object of class B,
and vice-versa; ignoring fields entirely for a moment, define
the type

\[
\begin{array}{l}
\mathit{AB} = \mu w{::}\{A :: \mathsf{Type},\ B :: \mathsf{Type}\}. \\
\quad \{A = \exists\mathsf{tail}{::}\mathsf{Type} \Rightarrow R^{\{\mathsf{getb}\}}.\ \mu\,\mathsf{self}{::}\mathsf{Type}.\ \{\mathsf{getb} : \mathsf{self} \to w \cdot B ;\ \mathsf{tail}\ \mathsf{self}\}, \\
\quad \ B = \exists\mathsf{tail}{::}\mathsf{Type} \Rightarrow R^{\{\mathsf{geta}\}}.\ \mu\,\mathsf{self}{::}\mathsf{Type}.\ \{\mathsf{geta} : \mathsf{self} \to w \cdot A ;\ \mathsf{tail}\ \mathsf{self}\}\}
\end{array}
\]

Using the contextual fold/unfold described earlier, objects of
class A can be folded into the type $\mathit{AB} \cdot A$. This is the natural
generalization of the twin fixpoint. In the most general case,
any class can refer to any other; thus, w must expand to
include all classes. This is the technique we use in the formal
translation. In a real compiler, we would analyze the reference
graph and cluster the strongly-connected classes only. Note
that this only addresses the typing aspect; mutual recursion
also has term-level implications (any class can construct
objects of or downcast to any other; see section 4.3) as well as
interactions with privacy (see section 5).

This completes our informal account of the object
encoding; we now turn to a formal translation of FJ types.
Figure 4 defines several functions which govern the layout of
fields and methods in object types. Square brackets $[\,\cdot\,]$
denote sequences. The sequence $s_1 \mathbin{+\!\!+} s_2$ is the concatenation of
sequences $s_1$ and $s_2$. $|s|$ denotes the number of elements in s.
The domain of a sequence of pairs dom(s) is a set consisting
of the first elements of each pair in s.

The  function  fieldvec  maps  a  class  name  C  to  a  sequence 
of  tuples  of  the  form  (f,  D),  indicating  a  field  of  type  D  named 
f — except  for  the  first  tuple  in  the  sequence,  which  is  always 
(vtab,  vt),  a  placeholder  for  the  vtable.  Each  class  simply 
appends  its  own  fields  onto  the  sequence  of  fields  from  its 
super  class.  (In  FJ,  the  fields  of  a  class  are  assumed  to  be 
distinct  from  those  of  its  super  classes.) 

The  layout  of  methods  in  an  object  type  is  somewhat 
trickier.  Methods  that  appear  in  a  class  definition  are  either 
new  or  they  override  methods  in  the  super  class.  Overriding 
methods  do  not  deserve  a  new  slot  in  the  vtable.  The  function 
methvec  maps  a  class  name  C  to  a  sequence  of  tuples  of  the 
form  (m,  T),  indicating  a  method  named  m  with  signature  T. 
Signatures have the form $D_1 \ldots D_n \to D$. The helper function
addmeth  iterates  through  all  the  methods  defined  in  the  class 
C,  adding  only  those  methods  that  are  new.  The  first  tuple 
in  methvec  is  always  (dyncast,  dc),  a  pseudo-method  used  to 
implement  dynamic  casts. 
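The layout functions can be sketched directly from this description. The class table below (Point with getx and bump, ColorPoint overriding bump and adding getc) and its tuple representation are assumptions for illustration; vt and dc are kept as opaque placeholder strings.

```python
# Hypothetical class table:
#   name -> (superclass, [(field, class)], [(method, signature)])
CT = {
    "Point": ("Obj", [("x", "Int")],
              [("getx", ((), "Int")), ("bump", ((), "Point"))]),
    "ColorPoint": ("Point", [("c", "Color")],
                   [("getc", ((), "Color")), ("bump", ((), "Point"))]),
}


def fieldvec(c):
    """Field layout: the vtable placeholder first, then fields in declaration order."""
    if c == "Obj":
        return [("vtab", "vt")]
    superclass, own_fields, _ = CT[c]
    return fieldvec(superclass) + own_fields


def addmeth(b, methods):
    """Keep only methods that are new with respect to super class b."""
    inherited = {m for m, _ in methvec(b)}
    return [(m, sig) for m, sig in methods if m not in inherited]


def methvec(c):
    """Vtable layout: dyncast first; overriding methods reuse the inherited slot."""
    if c == "Obj":
        return [("dyncast", "dc")]
    superclass, _, methods = CT[c]
    return methvec(superclass) + addmeth(superclass, methods)
```

In this example ColorPoint redefines bump, so methvec("ColorPoint") contains bump once, at the offset it already had in Point, followed by the new slot for getc.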

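Let cn denote the set of class names in the program of
interest, including Obj. We abbreviate the kind of a tuple of
all object types as kcn. The tuple of row kinds for class C is
abbreviated ktail[C]:

\[
\begin{array}{rcl}
\mathit{kcn} & = & \{(E :: \mathsf{Type})^{E \in \mathit{cn}}\} \\
\mathit{ktail}[C] & = & \{m :: \mathsf{Type} \Rightarrow R^{\mathit{dom}(\mathit{methvec}(C))},\ f :: R^{\mathit{dom}(\mathit{fieldvec}(C))}\}
\end{array}
\]

For brevity, we sometimes omit kind annotations. By convention,
certain named type variables are bound by particular
kinds: w has kind kcn, self and u have kind Type, and tail
has kind ktail[C], where C should be evident from the context.

\[
\mathit{fieldvec}(\mathsf{Obj}) = [(\mathsf{vtab},\ \mathit{vt})]
\]

\[
\frac{\mathit{CT}(C) = \mathsf{class}\ C \triangleleft B\ \{D_1\ f_1 ; \ldots ; D_n\ f_n ;\ K \ldots\}}
     {\mathit{fieldvec}(C) = \mathit{fieldvec}(B) \mathbin{+\!\!+} [(f_1, D_1) \ldots (f_n, D_n)]}
\]

\[
\mathit{methvec}(\mathsf{Obj}) = [(\mathsf{dyncast},\ \mathit{dc})]
\]

\[
\frac{\mathit{CT}(C) = \mathsf{class}\ C \triangleleft B\ \{\ldots\ K\ M_1 \ldots M_n\}}
     {\mathit{methvec}(C) = \mathit{methvec}(B) \mathbin{+\!\!+} \mathit{addmeth}(B, [M_1 \ldots M_n])}
\]

\[
\frac{(m, \_) \in \mathit{methvec}(B)}
     {\mathit{addmeth}(B, [D\ m(D_1\ x_1 \ldots D_k\ x_k)\ \{\ldots\}\ M_2 \ldots M_n]) = \mathit{addmeth}(B, [M_2 \ldots M_n])}
\]

\[
\frac{(m, \_) \notin \mathit{methvec}(B)}
     {\mathit{addmeth}(B, [D\ m(D_1\ x_1 \ldots D_k\ x_k)\ \{\ldots\}\ M_2 \ldots M_n]) = [(m,\ D_1 \ldots D_k \to D)] \mathbin{+\!\!+} \mathit{addmeth}(B, [M_2 \ldots M_n])}
\]

\[
\mathit{addmeth}(B, [\,]) = [\,]
\]

Figure 4: Field and method layouts for object types.

Definition of the Rows operator (part of figure 5):

\[
\mathit{Rows}[C, C] = \lambda w.\ \lambda u.\ \lambda\mathsf{tail}{::}\mathit{ktail}[C].\ \mathsf{tail}
\]

\[
\begin{array}{l}
\mathit{Rows}[\mathsf{Obj}, \top] = \lambda w.\ \lambda u.\ \lambda\mathsf{tail}{::}\mathit{ktail}[\mathsf{Obj}]. \\
\quad \{m = \lambda\mathsf{self}.\ (\mathsf{dyncast} : \mathsf{self} \to \forall\alpha.\ (u \to \mathsf{maybe}\ \alpha) \to \mathsf{maybe}\ \alpha ;\ \mathsf{tail} \cdot m\ \mathsf{self}), \\
\quad \ f = \mathsf{tail} \cdot f\}
\end{array}
\]

\[
\frac{\begin{array}{c}
\mathit{CT}(C) = \mathsf{class}\ C \triangleleft B\ \{D_1\ f_1 \ldots D_n\ f_n\ K\ M_1 \ldots M_m\} \\
\mathit{addmeth}(B, [M_1 \ldots M_m]) = [(l_1, T_1) \ldots (l_m, T_m)] \qquad \mathit{Rows}[B, A] = \tau
\end{array}}
     {\begin{array}{l}
\mathit{Rows}[C, A] = \lambda w.\ \lambda u.\ \lambda\mathsf{tail}{::}\mathit{ktail}[C]. \\
\quad \tau\ w\ u\ \{m = \lambda\mathsf{self}.\ (l_1 : \mathit{Ty}[\mathsf{self}; w; T_1] ; \ldots ; l_m : \mathit{Ty}[\mathsf{self}; w; T_m] ;\ \mathsf{tail} \cdot m\ \mathsf{self}), \\
\qquad\qquad\ \ f = (f_1 : w \cdot D_1 ; \ldots ; f_n : w \cdot D_n ;\ \mathsf{tail} \cdot f)\}
\end{array}}
\]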

In figure 5 we define Rows, a type operator that produces
rows containing the fields and methods introduced between
two classes in a subclass relationship. Intuitively, Rows[C, A]
includes fields and methods in class C but not in its ancestor
class A. Earlier we described how to package dynamic classes
into static classes; the witness type was a tuple of rows
containing the fields and methods in the dynamic class but not
in the static class. This is just one use of the Rows operator.

The type operator Rows[C, A] expects three arguments: w,
the tuple containing object types for all classes; u, a universal
type used to implement dynamic casts; and tail, a tuple of
rows containing members of subclasses. The implementation
of dynamic cast will be explained in section 4.3. For now, we
only observe that the macros in figure 5 simply propagate u so
that it can appear in the type of the dyncast pseudo-method.

Rows[C, A] is defined by three cases. First, if C and A are
the same class, then the result is just the tail: those members
in subclasses of C. Second, if C is Obj (the root of the class
hierarchy) and A is the special symbol $\top$ then the result is
the members declared in Obj. Treating $\top$ as the trivial super
class of Obj permits more uniform specifications (since
Obj contains members of its own). Finally, in the inductive
case (where C <: A) we look to C’s super class; let’s call it
B. Rows[B, A] produces a type operator for the members
between B and A; we need only append the new members of C.
Conveniently, Rows[B, A] has a tail parameter specifically for
appending new members.
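The three cases can be mirrored at the level of plain label lists. The sketch below is a deliberate simplification: it computes only which members Rows[C, A] contributes, and it drops the tail parameter, the w and u abstractions, and all kinds. The class table and the TOP marker are hypothetical.

```python
TOP = "TOP"   # stands for the special symbol treated as Obj's super class

# Hypothetical class table:
#   name -> (superclass, [(field, class)], [(method, signature)])
CT = {
    "Point": ("Obj", [("x", "Int")], [("getx", ((), "Int"))]),
    "ColorPoint": ("Point", [("c", "Color")],
                   [("getc", ((), "Color")), ("getx", ((), "Int"))]),
}


def method_names(c):
    """Labels already present in the vtable of class c."""
    if c == "Obj":
        return {"dyncast"}
    superclass, _, methods = CT[c]
    return method_names(superclass) | {m for m, _ in methods}


def rows(c, a):
    """Members in c but not in its ancestor a, as method and field lists."""
    if c == a:                                   # case 1: nothing but the tail
        return {"m": [], "f": []}
    if c == "Obj":                               # case 2: members declared in Obj
        if a != TOP:
            raise ValueError("a is not an ancestor of c")
        return {"m": [("dyncast", "dc")], "f": []}
    superclass, own_fields, methods = CT[c]      # case 3: extend Rows[B, A]
    inner = rows(superclass, a)
    inherited = method_names(superclass)
    new_methods = [(m, sig) for m, sig in methods if m not in inherited]
    return {"m": inner["m"] + new_methods, "f": inner["f"] + own_fields}
```

For instance, rows("ColorPoint", "Point") lists exactly the members a witness type must supply when a ColorPoint is viewed at static class Point.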

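The new fields in C are precisely those listed in the
declaration of C; we fetch their types from w and append $\mathsf{tail} \cdot f$.
The new methods in C are found using addmeth, and their
type signatures $D_1 \ldots D_n \to D$ are translated to arrow types
$\mathsf{self} \to w \cdot D_1 \to \ldots \to w \cdot D_n \to w \cdot D$. We use curried arguments for
convenience; an implementation would use multi-argument
functions instead. As shown in the informal examples, the
row for methods is parameterized by the type of self.

\[
\mathit{Ty}[\mathsf{self}; w; D_1 \ldots D_n \to D] = \mathsf{self} \to w \cdot D_1 \to \ldots \to w \cdot D_n \to w \cdot D
\]

\[
\begin{array}{rcl}
\mathit{Empty}[C] & = & \{m = \lambda\mathsf{self}.\ \mathsf{Abs}^{\mathit{dom}(\mathit{methvec}(C))},\ f = \mathsf{Abs}^{\mathit{dom}(\mathit{fieldvec}(C))}\} \\[4pt]
\mathit{ObjRcd}[C] & = & \lambda w.\ \lambda u.\ \lambda\mathsf{tail}.\ \lambda\mathsf{self}. \\
 & & \quad \boldsymbol{\{}\mathsf{vtab} : \boldsymbol{\{}(\mathit{Rows}[C, \top]\ w\ u\ \mathsf{tail}) \cdot m\ \mathsf{self}\boldsymbol{\}} ; \\
 & & \quad\ \, (\mathit{Rows}[C, \top]\ w\ u\ \mathsf{tail}) \cdot f\boldsymbol{\}} \\[4pt]
\mathit{SelfTy}[C] & = & \lambda w.\ \lambda u.\ \lambda\mathsf{tail}.\ \mu\,\mathsf{self}.\ \mathit{ObjRcd}[C]\ w\ u\ \mathsf{tail}\ \mathsf{self} \\
\mathit{ObjTy}[C] & = & \lambda w.\ \lambda u.\ \exists\mathsf{tail}.\ \mathit{SelfTy}[C]\ w\ u\ \mathsf{tail} \\
\mathit{World} & = & \lambda u.\ \mu w.\ \{(E = \mathit{ObjTy}[E]\ w\ u)^{E \in \mathit{cn}}\}
\end{array}
\]

Figure 5: Macros for object types.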

Also in figure 5, we use the Rows operator to define
macros for several variants of the object type for any
given class. Empty[C] denotes the tuple of empty field and
method rows of kind ktail[C]. ObjRcd[C] assembles the rows
into records, leaving the subclass rows and self type open.
SelfTy[C] closes self with a fixpoint, and ObjTy[C] hides the
subclass rows with an existential. Each of these variants is
used in our term translation. All of them remain abstracted
over both w (the types of other objects) and u (the universal
type, which is simply propagated into the type of dyncast).
Finally, World constructs a package of the types of objects of
all classes, given the universal type u; as we will see later, the
actual universal type is a labeled sum of object types, and is
defined recursively using World.

4.2  Expression  translation 

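Equipped with an efficient object encoding and several type
operators for describing it, we now examine the type-directed
translation of FJ expressions. Figure 6 contains term macros
pack and upcast and six rules governing the judgment
$\textsc{exp}[\Gamma; u; \mathsf{classes}; e] = \mathbf{e}$ for term translation. $\Gamma$ is the FJ type
environment, u is the universal sum type, classes is a record
containing the runtime representations of each class, $e$ is an
FJ expression, and $\mathbf{e}$ is its corresponding term in the target
language. If $e$ has type C, then its translation $\mathbf{e}$ should have
type $(\mathit{World}\ u) \cdot C$.

\[
\textsc{exp}[\Gamma; u; \mathsf{classes}; x] = x
\qquad \text{(var)}
\]

\[
\frac{\Gamma \vdash e \in D \qquad \textsc{exp}[\Gamma; u; \mathsf{classes}; e] = \mathbf{e} \qquad D <: C \qquad \textsc{upcast}[D; C; u; \mathbf{e}] = \mathbf{e}'}
     {\textsc{exp}[\Gamma; u; \mathsf{classes}; (C)\,e] = \mathbf{e}'}
\qquad \text{(upcast)}
\]

\[
\frac{(f, \_) \in \mathit{fieldvec}(C) \qquad \Gamma \vdash e \in C \qquad \textsc{exp}[\Gamma; u; \mathsf{classes}; e] = \mathbf{e}}
     {\begin{array}{l}
\textsc{exp}[\Gamma; u; \mathsf{classes}; e.f] = \\
\quad \mathsf{open}\ (\mathsf{unfold}\ \mathbf{e}\ \mathsf{as}\ \mathit{World}\ u\ \mathsf{at}\ C) \\
\quad \mathsf{as}\ \langle \mathsf{tail},\ x : \mathit{SelfTy}[C]\ (\mathit{World}\ u)\ u\ \mathsf{tail} \rangle \\
\quad \mathsf{in}\ (\mathsf{unfold}\ x).f
\end{array}}
\qquad \text{(field)}
\]

\[
\frac{\begin{array}{c}
(m,\ B_1 \ldots B_n \to B) \in \mathit{methvec}(C) \qquad \Gamma \vdash e \in C \qquad \textsc{exp}[\Gamma; u; \mathsf{classes}; e] = \mathbf{e} \\
\Gamma \vdash e_i \in D_i \qquad \textsc{exp}[\Gamma; u; \mathsf{classes}; e_i] = \mathbf{e}_i \qquad D_i <: B_i \qquad \textsc{upcast}[D_i; B_i; u; \mathbf{e}_i] = \mathbf{e}'_i \quad (\forall i \in \{1 \ldots n\})
\end{array}}
     {\begin{array}{l}
\textsc{exp}[\Gamma; u; \mathsf{classes}; e.m(e_1 \ldots e_n)] = \\
\quad \mathsf{open}\ (\mathsf{unfold}\ \mathbf{e}\ \mathsf{as}\ \mathit{World}\ u\ \mathsf{at}\ C) \\
\quad \mathsf{as}\ \langle \mathsf{tail},\ x : \mathit{SelfTy}[C]\ (\mathit{World}\ u)\ u\ \mathsf{tail} \rangle \\
\quad \mathsf{in}\ (\mathsf{unfold}\ x).\mathsf{vtab}.m\ x\ \mathbf{e}'_1 \ldots \mathbf{e}'_n
\end{array}}
\qquad \text{(invoke)}
\]

\[
\frac{\begin{array}{c}
\mathit{fields}(C) = B_1\ f_1 \ldots B_n\ f_n \\
\Gamma \vdash e_i \in D_i \qquad \textsc{exp}[\Gamma; u; \mathsf{classes}; e_i] = \mathbf{e}_i \qquad D_i <: B_i \qquad \textsc{upcast}[D_i; B_i; u; \mathbf{e}_i] = \mathbf{e}'_i \quad (\forall i \in \{1 \ldots n\})
\end{array}}
     {\textsc{exp}[\Gamma; u; \mathsf{classes}; \mathsf{new}\ C(e_1 \ldots e_n)] = (\mathsf{classes}.C\ \{\}).\mathsf{new}\ \mathbf{e}'_1 \ldots \mathbf{e}'_n}
\qquad \text{(new)}
\]

\[
\frac{\Gamma \vdash e \in D \qquad C <: D \qquad \textsc{exp}[\Gamma; u; \mathsf{classes}; e] = \mathbf{e}}
     {\begin{array}{l}
\textsc{exp}[\Gamma; u; \mathsf{classes}; (C)\,e] = \\
\quad \mathsf{open}\ (\mathsf{unfold}\ \mathbf{e}\ \mathsf{as}\ \mathit{World}\ u\ \mathsf{at}\ D) \\
\quad \mathsf{as}\ \langle \mathsf{tail},\ x : \mathit{SelfTy}[D]\ (\mathit{World}\ u)\ u\ \mathsf{tail} \rangle \\
\quad \mathsf{in}\ \mathsf{case}\ (\mathsf{unfold}\ x).\mathsf{vtab}.\mathsf{dyncast}\ x\ [(\mathit{World}\ u) \cdot C]\ (\mathsf{classes}.C\ \{\}).\mathsf{proj} \\
\qquad \mathsf{of}\ \mathsf{some}\ y \Rightarrow y \\
\qquad \mathsf{else}\ \text{¡ClassCast error!}
\end{array}}
\qquad \text{(dncast)}
\]

Macros for pack and upcast transformations:

\[
\begin{array}{l}
\textsc{pack}[C; u; \mathsf{tail}; e] = \\
\quad \mathsf{fold}\ \langle \mathsf{tail}'{::}\mathit{ktail}[C] = \mathsf{tail},\ e : \mathit{SelfTy}[C]\ (\mathit{World}\ u)\ u\ \mathsf{tail}' \rangle \\
\quad \mathsf{as}\ \mathit{World}\ u\ \mathsf{at}\ C \\[6pt]
\textsc{upcast}[C; A; u; e] = \\
\quad \mathsf{open}\ (\mathsf{unfold}\ e\ \mathsf{as}\ \mathit{World}\ u\ \mathsf{at}\ C) \\
\quad \mathsf{as}\ \langle \mathsf{tail},\ x : \mathit{SelfTy}[C]\ (\mathit{World}\ u)\ u\ \mathsf{tail} \rangle \\
\quad \mathsf{in}\ \textsc{pack}[A; u; \mathit{Rows}[C, A]\ (\mathit{World}\ u)\ u\ \mathsf{tail}; x]
\end{array}
\]

Figure 6: Type-directed translation of FJ expressions.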

The pack macro packages and folds an open-self term
into a closed, complete object type in mutual fixpoint with
all others. Supposing that tail is some row tuple in ktail[C]
and e has type (SelfTy[C] w u tail), the term pack[C; u; tail; e]
has type $w \cdot C$. The upcast macro unfolds and repackages an
object term to a term of some super class. When C <: A and e
has type $w \cdot C$, the term upcast[C; A; u; e] has type $w \cdot A$. These
macros simply and effectively formalize the encoding
techniques demonstrated in the previous section. They employ
erasable type manipulations only. Note the use of Rows[C, A]
as the new witness type in upcast.

We now explain each of the translation rules in figure 6,
beginning with (var). Variables in FJ are bound as method
arguments. Methods are translated as curried abstractions
binding the same variable names. Therefore, variable
translation (var) is trivial. An upcast expression (C)e (where
$\Gamma \vdash e \in D$ and D <: C) is also trivial; the rule (upcast)
delegates its task to the macro of the same name.

The field selection expression e.f translates to an unfold-
open-unfold-select idiom in the target language (field). In
this sequence, the select alone has runtime effect. Method
invocation e.m(e1 ... en) augments the idiom with applications
to self and the other arguments, but there is one complication.
The FJ typing rule permits the actual arguments to have
types that are subclasses of the types in the method signature.
Since our encoding does not utilize subtyping, the function
selected from the vtable expects arguments of precisely the
types in the method signature. Therefore, we must explicitly
upcast all arguments. Rule (invoke) formalizes the self
application technique demonstrated earlier.
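
As an informal illustration of the erased behavior of (field) and (invoke), the following Python sketch models an object as a record with a shared vtable. It is a sketch under our own assumptions, not the formal translation: the names make_point, invoke, getx, and add are hypothetical, and the upcasts of the arguments vanish because they have no runtime effect.

```python
# Type-erased sketch: an object is a record holding a shared vtable and fields.
# All names here (make_point, invoke, getx, add) are illustrative assumptions.

def getx(self):
    # Field selection: after erasure only the final select remains.
    return self["x"]

def add(self, other):
    # A method taking one argument besides self.
    return self["x"] + other["x"]

POINT_VTAB = {"getx": getx, "add": add}  # built once, shared by all points

def make_point(x):
    return {"vtab": POINT_VTAB, "x": x}

def invoke(obj, name, *args):
    # Self-application: select the method from the object's own vtable,
    # then apply it to the object itself followed by the other arguments.
    return obj["vtab"][name](obj, *args)

p = make_point(3)
q = make_point(4)
result = invoke(p, "add", q)
```

Both points share one vtable, and each call passes the receiver as the first argument, which is the self-application pattern the rule formalizes.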

The code to create a new object of class C essentially selects
and applies C’s constructor from the classes record. Until
we explain class encoding and linking, the type of classes
will be difficult to justify. Presently it will suffice to say that
classes.C applied to the unit value {} returns a record which
contains a function new, the constructor for class C. The
translation (new) upcasts all the arguments, then fetches and
applies the constructor.
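
The shape of (new) after erasure can be sketched as follows. This is only an illustration under our own assumptions: point_class, translate_new, and the dictionary layout are hypothetical names, and the unit value {} is modelled by an empty tuple.

```python
# Sketch of (new): classes.C applied to the unit value yields a record
# whose "new" field is the constructor. Names are illustrative assumptions.

def point_class(_unit):
    vtab = {"getx": lambda self: self["x"]}

    def new(x):
        return {"vtab": vtab, "x": x}

    return {"new": new}

classes = {"Point": point_class}

def translate_new(class_name, *args):
    # Fetch the constructor from (classes.C {}).new and apply it.
    # Upcasting the arguments is a no-op once types are erased.
    ctor = classes[class_name](())["new"]
    return ctor(*args)

p = translate_new("Point", 5)
```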

The final case, dynamic casts, may appear quite magical
until we reveal the implementation of the dyncast
pseudo-method in the next section. For now it is enough
to treat dyncast as a black box: a polymorphic function
with type ∀α. (u → maybe α) → maybe α. The argument of
dyncast [ObjTy[C] w u] is a projection function, attempting to
convert a value of type u to an object of type ObjTy[C] w u.
In addition to the new function, the classes record contains a
proj field for each class C, of type u → maybe (ObjTy[C] w u).
Thus if we select the dyncast method from an object, instantiate
it with the object type for some class C, then pass it the
projection for class C, it will return some C object if the cast
succeeds, or none if it fails. In case of failure, evaluation gets
stuck, just as it does in FJ. In full Java, we would throw a
ClassCast exception.
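
To make the use site of dyncast concrete, here is a small Python sketch of the cast translation with dyncast treated as a black box. Everything in it is an assumption made for illustration: the tagged-pair representation of u, the helper names, and the choice to raise an exception (the full-Java behavior) rather than get stuck.

```python
# Sketch of the cast translation; dyncast is deliberately opaque here.
# Representation and names are illustrative assumptions.

class ClassCastError(Exception):
    pass

NONE = ("none",)

def some(value):
    return ("some", value)

def make_proj(cls):
    # Projection u -> maybe C: succeeds only on values tagged with cls.
    def proj(tagged):
        tag, obj = tagged
        return some(obj) if tag == cls else NONE
    return proj

def make_obj(ancestry):
    # ancestry lists the dynamic class first, then its super classes.
    def dyncast(self, proj):
        for cls in ancestry:
            result = proj((cls, self))
            if result[0] == "some":
                return result
        return NONE
    return {"vtab": {"dyncast": dyncast}}

def translate_cast(obj, target):
    # case (unfold x).vtab.dyncast x [C] proj_C of some y => y else error
    result = obj["vtab"]["dyncast"](obj, make_proj(target))
    if result[0] == "some":
        return result[1]
    raise ClassCastError(target)

cp = make_obj(["ColorPoint", "Point", "Obj"])
pt = make_obj(["Point", "Obj"])
```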

The expression translation judgment EXP preserves types.
Informally, if e has type C then its translation has type
(World u)-C (for some type u). To state this property formally,
we must first translate all the types in the FJ typing
environment Γ:

ENV[u; Γ, x : D] = ENV[u; Γ], x : (World u)-D
ENV[u; ∘] = ∘

By inspection, it is easy to show that ENV[u; Γ] is a well-
formed environment, assuming that the range of Γ is a subset
of cn. The type preservation theorem and a proof sketch follow;
for more detail, please consult the companion technical
report.
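
The environment translation is a simple pointwise map. As a hedged illustration, the sketch below represents an FJ environment as a list of (variable, class name) pairs and target types as strings; both choices, and the name env_translate, are our own assumptions.

```python
# Sketch of the environment translation: each binding x : D becomes
# x : (World u)-D, and the empty environment maps to itself.
# The list-of-pairs and string representations are illustrative assumptions.

def env_translate(u, gamma):
    return [(x, f"(World {u})-{d}") for x, d in gamma]

translated = env_translate("u", [("x", "Point"), ("this", "ColorPoint")])
```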

Theorem 1 (type preservation) If Φ ⊢ u :: Type,
Φ; Δ ⊢ classes : {Classes (World u) u} and Γ ⊢ e ∈ C then
Φ; Δ, ENV[u; Γ] ⊢ EXP[Γ; u; classes; e] : (World u)-C.

Proof: by induction on the structure of e. All cases are
straightforward if we factor out and prove several properties
as lemmas. First, we must establish a correspondence
between the fields used in the FJ semantics and the fieldvec
relation used for object layout (likewise between mtype and
methvec). Second, we must establish the correspondence
between pairs in fieldvec (or methvec) and elements in Rows. All
these correspondences are proved by induction on the class
hierarchy. Finally, we must show that the PACK and UPCAST
macros return expressions of the expected type. These can be
proved by inspection, but the latter argument requires a
non-trivial coherence property for Rows. Specifically, the
composition Rows[A, T] w u (Rows[C, A] w u tail) must be equivalent
to Rows[C, T] w u tail. This is proved by induction on the
derivation of C <: A. □

4.3  Class  encoding 

Apart from defining types, classes in FJ serve three other
roles: they are extended, invoked to create new objects, and
specified as targets of dynamic casts. In our translation, each
class declaration is separately compiled into a module exporting
a record with three elements, one to address each of
these roles. We informally explain our techniques for
implementing inheritance, constructors, and dynamic casts, then
give the formal translation of class declarations.

In a class-based language, each vtable is constructed once
and shared among all objects of the same class. In addition,
methods inherited by subclasses should be shared. How might
we implement the Point methods so that they can be packaged
with a ColorPoint? We make the method record polymorphic
over the tail of the self type:

dictPT = Λtail::ktail[PT].
           {getx = λself : S_PT. (unfold self).x}

where S_PT = μα. {vtab : {getx : α → int ; tail-m α} ;
                  x : int ; tail-f}

We  call  this  polymorphic  record  a  dictionary.  By  instantiating 
it  with  different  tails,  we  can  directly  package  its  contents 


Dict[C]   = λw. λu. λself.
              {(Rows[C, T] w u Empty[C])

Ctor[C]   = λw. w-D1 → ... → w-Dn → w-C
              where fields(C) = D1 f1 ... Dn fn

Proj[C]   = λw. λu. u → maybe w-C

Inj[C]    = λw. λu. w-C → u

Class[C]  = λw. λu.
              {dict : ∀tail. Dict[C] w u (SelfTy[C] w u tail),
               proj : Proj[C] w u,
               new : Ctor[C] w}

Classes   = λw. λu. ((E : 1 → Class[E] w u ;)^{E∈cn} Abs^{cn})

ClassF[C] = ∀u. Inj[C] (World u) u → Proj[C] (World u) u →
              {Classes (World u) u} →
              1 → Class[C] (World u) u

Tagged    = λu. [(C : ObjTy[C] (World u)

Figure 7: Macros for dictionary, constructor, and class types.


into  objects  of  subclasses.  Instantiated  with  empty  tails  (e.g., 
Empty[PT]),  this  dictionary  becomes  a  vtable  for  class  Point. 
Suppose  the  ColorPoint  subclass  inherits  getx  and  adds  a 
method  of  its  own.  Its  dictionary  would  be: 

dictCP = Λtail::ktail[CP].
           {getx = (dictPT [r_CP]).getx,
            getc = λself : S_CP. (unfold self).c}

where r_CP = Rows[CP, PT] (World u) u Empty[CP]
and   S_CP = μα. {vtab : {getx : α → int ;
                          getc : α → color ; tail-m α} ;
                  x : int ; c : color ; tail-f}

Again, this dictionary can be instantiated with empty tails
to produce the ColorPoint vtable. With other instantiations,
further subclasses can inherit either of these methods. The
dictionary is labeled dict in the record exported by the class
translation.
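
Since type application is erased, a dictionary is at runtime just a record of code pointers, and inheriting a method means copying a pointer. The Python sketch below illustrates this under our own assumptions: the tail parameter is kept only as a placeholder for the erased type argument, and the function names are hypothetical.

```python
# Sketch of dictionaries after type erasure. The `tail` parameter stands in
# for the erased type argument and is ignored at runtime (an assumption of
# this illustration, as are all names used here).

def getx(self):
    return self["x"]

def getc(self):
    return self["c"]

def dict_pt(tail=None):
    return {"getx": getx}

def dict_cp(tail=None):
    return {
        "getx": dict_pt(tail)["getx"],  # inherited: the same code pointer
        "getc": getc,                   # added by ColorPoint
    }

point_vtab = dict_pt()        # instantiate with the empty tail
colorpoint_vtab = dict_cp()
cp = {"vtab": colorpoint_vtab, "x": 1, "c": "red"}
```

The inherited getx entry is the very same function that Point uses, so a ColorPoint pays nothing extra to reuse it.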

Constructors in FJ are quite simple; they take all the fields
as arguments in the correct order. Fields declared in the super
class are immediately passed to the super initializer. We
translate the constructor as a function which takes the fields as
curried arguments, places them directly into a record with the
vtable, and then folds and packages the object. The constructor
function is labeled new in the class record. In section 5,
we describe how to implement more realistic constructors.
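
A curried constructor of this kind can be sketched in Python as follows. The helper make_new and the field names are assumptions made for illustration; fold and pack do not appear because they are erased at runtime.

```python
# Sketch of a curried constructor: one argument per field, in order,
# placed directly into a record together with the shared vtable.
# make_new and the field names are illustrative assumptions.

def make_new(vtab, field_names):
    def step(collected):
        if len(collected) == len(field_names):
            obj = {"vtab": vtab}
            obj.update(zip(field_names, collected))
            return obj  # fold and pack have no runtime effect
        return lambda arg: step(collected + [arg])
    return step([])

COLORPOINT_VTAB = {
    "getx": lambda self: self["x"],
    "getc": lambda self: self["c"],
}

new_colorpoint = make_new(COLORPOINT_VTAB, ["x", "c"])
cp = new_colorpoint(1)("red")
```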

Implementing dynamic cast in a strongly-typed language
is challenging. Somehow we must determine whether an
arbitrary, abstractly-typed object belongs to a particular class. If
it does belong, we must somehow refine its type to reflect this
new information. Exception matching in SML poses a similar
problem. To address these issues, Harper and Stone [15]
introduce tags: values which track type information at runtime.
If a tag of abstract type Tag α equals another tag of
known type Tag τ, then we update the context to reflect that
α = τ. Note that this differs from intensional type analysis
[14], which performs structural comparison and does not
distinguish named types.

Tags work well with our encoding; in an implementation

Class declaration translation:

CDEC[C] =
  Λu::Type. λinj : Inj[C] (World u) u. λproj : Proj[C] (World u) u.
  λclasses : {Classes (World u) u}. λ_ : 1.
  let dict : ∀tail::ktail[C]. Dict[C] (World u) u
                                (SelfTy[C] (World u) u tail)
        = DICT[C; u; inj; classes]
  in let vtab = dict [Empty[C]]
  in {dict = dict, proj = proj, new = NEW[C; u; vtab]}

Dictionary construction:

DICT[Obj; u; inj; classes] =
  Λtail::ktail[Obj]. {dyncast =
    λself : SelfTy[C] (World u) u tail.
    Λα::Type. λproj : u → maybe α.
    proj (inj PACK[Obj; u; tail; self])}

CT(C) = class C ◁ B
dom(methvec(C)) = [l1 ... ln]
------------------------------------------------------------
DICT[C; u; inj; classes] =
  Λtail::ktail[C].
  let super : Dict[B] (World u) u
                (SelfTy[C] (World u) u tail)
        = (classes.B {}).dict
            [Rows[C, B] (World u) u tail]
  in {l1 = METH[C; l1; u; tail; inj; classes; super], ...,
      ln = METH[C; ln; u; tail; inj; classes; super]}

Constructor code:

fields(C) = D1 f1 ... Dn fn
------------------------------------------------------------
NEW[C; u; vtab] =
  λf1 : (World u)-D1. ... λfn : (World u)-Dn.
  let x = fold {vtab = vtab, f1 = f1, ..., fn = fn}
          as SelfTy[C] (World u) u Empty[C]
  in PACK[C; u; Empty[C]; x]

Method code:

METH[C; dyncast; u; tail; inj; classes; super] =
  λself : SelfTy[C] (World u) u tail.
  Λα::Type. λproj : u → maybe α.
  case proj (inj PACK[C; u; tail; self])
    of some x ⇒ some [α] x
    else super.dyncast self [α] proj

CT(C) = class C ◁ B { ... K M1 ... Mn}
m not defined in M1 ... Mn
------------------------------------------------------------
METH[C; m; u; tail; inj; classes; super] = super.m

CT(C) = class C ◁ B { ... K M1 ... Mn}
∃j : Mj = A m(A1 x1 ... Am xm) {return e;}
Γ = x1 : A1, ..., xm : Am, this : C
Γ ⊢ e ∈ D    D <: A
EXP[Γ; u; classes; e] = e
------------------------------------------------------------
METH[C; m; u; tail; inj; classes; super] =
  λself : SelfTy[C] (World u) u tail.
  λx1 : (World u)-A1. ... λxm : (World u)-Am.
  let this : (World u)-C = PACK[C; u; tail; self]
  in UPCAST[D; A; u; e]


Figure 8: Translation of class declarations.


that supports assignment and an SML front-end, it may be
a good choice. In this formal presentation, however, type
refinement complicates the soundness proof and the imperative
nature of maketag constrains the operational semantics,
which is otherwise free of side effects. maketag implements a
dynamically extensible sum, which is needed for SML exceptions,
but is overkill for classes in FJ.

We propose a simpler approach, which co-opts the dynamic
dispatch mechanism. The vtable itself provides a kind
of runtime class information. A designated method, if overridden
in every class, could return the receiver at its dynamic
class or any super class. We just need a runtime representation
of the target class of the cast, and some way to connect
that representation to the corresponding object type. For this,
we can use the standard sum type and a ‘one-armed’ case.
Let u be a sum type with a variant for each class in the class
table. The function

λx : u. case x of C y ⇒ some [ObjTy[C] w u] y
        else none [ObjTy[C] w u]

could dynamically represent class C. To connect it to the
object type, we make the dyncast method polymorphic, with
the type

self → ∀α. (u → maybe α) → maybe α


This  method  can  check  its  own  class  against  the  target  class 
by  injecting  self  and  applying  the  function  argument.  If  the 
result  is  none,  then  it  tries  again  by  injecting  as  the  super 
class,  and  so  on  up  the  hierarchy. 
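
The walk up the hierarchy can be sketched in Python with one dyncast override per class, each falling back on its super class. This is an illustration under our own assumptions: the sum type u is modelled as a (tag, object) pair, `some x` as a one-element tuple, `none` as None, and all function names are hypothetical.

```python
# Sketch of dyncast overridden in every class. `some x` is the 1-tuple (x,)
# and `none` is None; u is a (tag, object) pair. These representations and
# all names are illustrative assumptions.

def dyncast_obj(self, proj):
    return proj(("Obj", self))  # no super class left to try

def dyncast_point(self, proj):
    result = proj(("Point", self))  # inject at Point, then project
    return result if result is not None else dyncast_obj(self, proj)

def dyncast_colorpoint(self, proj):
    result = proj(("ColorPoint", self))
    return result if result is not None else dyncast_point(self, proj)

def proj_for(cls):
    # One-armed case: case x of cls y => some y else none.
    return lambda tagged: (tagged[1],) if tagged[0] == cls else None

cp = {"vtab": {"dyncast": dyncast_colorpoint}, "x": 1, "c": "red"}
pt = {"vtab": {"dyncast": dyncast_point}, "x": 2}

as_point = cp["vtab"]["dyncast"](cp, proj_for("Point"))
failed = pt["vtab"]["dyncast"](pt, proj_for("ColorPoint"))
```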

With this solution, we must be careful to preserve separate
compilation, since the universal type u includes a variant for
every class in the program. Fortunately, in a particular class
declaration we need only inject objects of that class. Class
declarations can treat u as an abstract type and take the
injection function as an argument. Then only the linker needs
to know the concrete u type.
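
The following Python sketch illustrates why treating u abstractly preserves separate compilation: the class code only ever calls the injection it is handed, so the same compiled code works with two different concrete representations of u. The function compile_point and both representations are assumptions made for this illustration.

```python
# Sketch: class code is parameterized by an injection into u and never
# inspects u itself. Names and representations are illustrative assumptions.

def compile_point(inj):
    def dyncast(self, proj):
        return proj(inj(self))
    return {"dyncast": dyncast}

# Linker side, representation 1: u is a (tag, object) pair.
def inj_pair(obj):
    return ("Point", obj)

def proj_pair(tagged):
    return (tagged[1],) if tagged[0] == "Point" else None

# Linker side, representation 2: u is a record with a numeric tag.
def inj_record(obj):
    return {"tag": 7, "value": obj}

def proj_record(tagged):
    return (tagged["value"],) if tagged["tag"] == 7 else None

p1 = {"vtab": compile_point(inj_pair), "x": 0}
p2 = {"vtab": compile_point(inj_record), "x": 0}
```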

We  now  explore  the  formal  translation  of  class  declarations 
and  construction  of  their  method  dictionaries.  In  figure  7 
we  define  several  macros  for  describing  dictionary  and  class 
types.  Figure  8  gives  translations  for  each  component  of  the 
class  declaration. 

Each class is separately compiled to code that resembles
an SML functor: a set of definitions parameterized by both
types and terms. Linking, the process of instantiating the
separate functors and combining them into a single coherent
program, will be addressed in the next section.

CDEC[C] produces the functor corresponding to class C;
see the definition in the top left of figure 8. The code has
one type parameter: u, the universal type used for dynamic


PROG[e] =
  let x_cn = LINK {(C = CDEC[C])^{C∈cn}}
  in EXP[∘; u; x_cn; e]
  where u = μu::Type. Tagged u

LINK = λx : {(C : ClassF[C])^{C∈cn}}.
  fix [Classes (World u) u]
    (λclasses : {Classes (World u) u}.
      {(C = x.C [u] inj_C proj_C classes)^{C∈cn}})
  where u = μu::Type. Tagged u
        inj_C = λx : ObjTy[C] (World u) u. fold (inj_C^{Tagged u} x) as u
        proj_C = λx : u. case unfold x
                   of C y ⇒ some [ObjTy[C] (World u) u] y
                   else none [ObjTy[C] (World u) u]

Figure 9: Program translation and linking.


casts.  Following  it  are  two  function  parameters  for  injecting 
and  projecting  objects  of  class  C.  The  next  parameter  is 
classes,  a  record  containing  definitions  for  other  classes  that 
are  mutually  recursive  with  C  (for  convenience,  we  assume 
that  each  class  refers  to  all  the  others).  The  final  parameter 
is  of  unit  type;  it  simply  delays  references  to  classes  so  that 
linking  terminates. 

In the functor body, we define dict (using the macro DICT)
and vtab (the trivial instantiation of dict). dict is placed in the
class record (so subclasses can inherit its methods); vtab is
passed to the NEW macro which creates the constructor code.
The constructor is exported so that other classes can create
C objects; and, finally, the projection function proj (a functor
parameter) is exported so other classes can dynamically cast
to C.

The dictionary for class Obj is hard-coded as
DICT[Obj; ...]. Its dyncast method injects self at class Obj,
passes this to the proj argument and returns the result. If the
class tags do not match, dyncast indicates failure by returning
none; there is no super class to test. For all other classes,
DICT fetches the super class dictionary from classes and
instantiates it as super. It then uses METH to construct code for
each method label in methvec.

METH supports three cases: it (1) produces the dyncast
method (which must be overridden in every class), (2) inherits
a method from the super class, or (3) constructs a new
method body by translating FJ code.
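
The three cases can be mirrored by a small Python function that fills in a dictionary label by label. It is only a sketch under our own assumptions: build_dict and its parameters are hypothetical, and "translating FJ code" is represented by looking up an already-built function.

```python
# Sketch of the three METH cases when building a class dictionary.
# build_dict and its arguments are illustrative assumptions.

def build_dict(labels, own_methods, super_dict, dyncast):
    result = {}
    for label in labels:
        if label == "dyncast":
            result[label] = dyncast             # (1) overridden in every class
        elif label in own_methods:
            result[label] = own_methods[label]  # (3) body defined in this class
        else:
            result[label] = super_dict[label]   # (2) inherited from super
    return result

def super_dyncast(self, proj):
    return None

def super_getx(self):
    return self["x"]

def own_dyncast(self, proj):
    return None

def own_getc(self):
    return self["c"]

super_dict = {"dyncast": super_dyncast, "getx": super_getx}
cp_dict = build_dict(["dyncast", "getx", "getc"],
                     {"getc": own_getc}, super_dict, own_dyncast)
```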

Theorem 2 (Well-typed class declaration)

Φ; Δ ⊢ CDEC[C] : ClassF[C]

Proof: by inspection. □

4.4  Linking 

The final task: instantiate and link the separate class modules
together into a single program. Figure 9 gives the translation
for a complete FJ program. The LINK function creates a
record of classes from a record of the class functors. The
result is bound to x_cn and used as the classes parameter in
translating the main program expression e.

LINK uses fix to create a fixpoint of the record of classes.
Each class functor in x has one type parameter and three
value parameters. Tagged was defined in figure 7 as a
parameterized sum type with a variant for the object type of each
class in the class table. We instantiate each x.C with the fixed
point of Tagged. Next we pass the injection and projection
functions, inj_C and proj_C. The final argument to x.C is the
classes record itself.
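
Linking can be sketched in Python as follows. This is an assumption-laden illustration rather than the formal LINK: the fixpoint is tied by filling in a mutable dictionary instead of using fix, the unit-typed parameter is modelled by a thunk taking an empty tuple, and the functor and field names are hypothetical.

```python
# Sketch of linking two mutually recursive class functors. The knot is tied
# by mutating `classes`, standing in for fix; the unit parameter delays every
# reference to `classes` until after linking. Names are assumptions.

def cdec_a(inj, proj, classes):
    def thunk(_unit):
        def new():
            return {"cls": "A"}

        def make_b():
            # Reference to another class, resolved only when called.
            return classes["B"](())["new"]()

        return {"new": new, "proj": proj, "make_b": make_b}
    return thunk

def cdec_b(inj, proj, classes):
    def thunk(_unit):
        def new():
            return {"cls": "B"}

        def make_a():
            return classes["A"](())["new"]()

        return {"new": new, "proj": proj, "make_a": make_a}
    return thunk

def link(functors):
    classes = {}
    for name, functor in functors.items():
        inj = lambda obj, name=name: (name, obj)
        proj = lambda t, name=name: (t[1],) if t[0] == name else None
        classes[name] = functor(inj, proj, classes)
    return classes

classes = link({"A": cdec_a, "B": cdec_b})
b = classes["A"](())["make_b"]()
```

Class A is linked before B exists in the record, yet the call succeeds because the reference to classes["B"] sits under a function and is only evaluated afterwards.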

Theorem 3 (Well-typed linkage)

Φ; Δ ⊢ LINK : {(E : ClassF[E])^{E∈cn}} → {Classes (World u) u}
where u = μu::Type. Tagged u

Proof: by inspection. □

5  Extensions 

Our encoding and translation strategy extend to support a
significant subset of Java. Features which require little
additional effort include null references (with maybe types),
assignment (with mutable records), multiple parameterized
constructors (by adding them to the class record), super calls
(as used in dyncast), and exceptions (as in SML).

In [18] we ambitiously supported Java interfaces using
views. To cast an object to an interface type, we fetch a
pre-computed view from the vtable and pair the object with it.
Thereafter, interface method calls are no more expensive than
virtual method calls. This technique works well with mutual
recursion and dynamic casts (even dynamic casts to interface
types), but we omit it because interfaces significantly
complicate the formal presentation, including the source language
semantics and type preservation proofs.

Another feature we supported in [18] is privacy: each
class used an existential to hide the types of its own private
fields. Thus privacy is preserved by the translation: link-time
type checking will prevent any other module from accessing
the private fields of a class, even if the module was translated
from a different source language.

Unfortunately, privacy interacts badly with mutual recursion.
Suppose that A has a private field b of class B and that
B has a method geta that returns an object of class A. From
within class A, accessing this.b is allowed, as is invoking
this.b.geta(). It is more difficult to design an encoding
that also allows this.b.geta().b. Using the existential
interpretation of privacy from [18], each class has its own view
of the types of all other objects. From within class A, private
fields of other objects of class A are visible. Private fields of
objects of other classes are hidden, represented by type
variables. In our example, this.b would have a type something
like “B with private fields β” where β is the abstract type.
Likewise, from within class B, the type of method geta might
be self → (“A with private fields α”). The challenge is to allow
class A to see that the α in the type of geta is actually the
known type of its own private fields.

Propagating this information is especially tricky given the
weaknesses of the iso-recursive types used in our target
calculus. We have developed a solution which does not require
extending the target calculus. Briefly, we need to parameterize
everything (including the hidden type itself) by the types
of objects of other classes. Then, each class can instantiate
the types of the rest of the world using concrete types for its
own private fields (wherever they may lurk in other classes)
and abstract types for the rest. Unfortunately, the issues are
subtle and a detailed explanation would go beyond the scope
of the current paper. We are considering extending FJ itself
with privacy in order to formalize our argument.

We  are  also  actively  working  on  other  extensions.  In  the 
original  Featherweight  Java  paper,  for  example,  Igarashi  et 
al.  formalize  Generic  Java  (GJ)  [3]  and  translate  it  back 
to  FJ  by  erasing  type  parameters  and  adding  dynamic  casts. 
With  at  most  a  minor  extension  to  our  target  language  type 
system,  we  should  be  able  to  translate  GJ  without  resorting 
to  dynamic  casts. 

6  Related  work 

Fisher and Mitchell [10] use extensible objects to model
Java-like class constructs. Our encoding does not rely on
extensible objects as primitives, but it may be viewed as an
implementation of some of their properties in terms of simpler
constructs. Remy and Vouillon [28] use row polymorphism
in Objective ML for both class types and type inference on
unordered records. Our calculus is explicitly typed, but we
use ordered rows to represent the open type of self.

Our object representation is superficially similar to several
of the classic encodings in Fω-based languages [5, 26]. As in
the Abadi, Cardelli, and Viswanathan encoding [2], method
invocation uses self-application; however, we hide the actual
class of the receiver using existential quantification over row
variables instead of splitting the object into a known interface
and a hidden implementation. This allows reuse of methods
in subclasses without any overhead. We use an analog of the
recursive-existential encoding due to Bruce [4] to give types
to other arguments or results belonging to the same class or a
subclass, as needed in Java, without over-restricting the type
to be the same as the receiver’s.

Several other researchers have described type-preserving
compilation of object-oriented languages. Wright, et al. [32]
compile a Java subset to a typed intermediate language, but
they use unordered records and resort to dynamic type
checks because their system is too weak to type self
application. Crary [7] encodes the object calculus of Abadi and
Cardelli [1] using existential and intersection types in a
calculus of coercions. His object encoding has some of the same
benefits as ours, though the coercion calculus is a significant
departure. Glew [12] translates a simple class-based
object  calculus  into  an  intermediate  language  with  F-bounded 
polymorphism  [6,  9]  and  a  special  ‘self’  quantifier:  a  more 
complex  and  ad-hoc  target  calculus.  The  present  work  is 
a  significant  extension  and  simplification  of  the  preliminary 
results  we  reported  in  [18]. 

We present a more detailed comparison of Glew, Crary,
and our own encoding in a forthcoming technical report [19].
Briefly, Glew’s self quantifier self α. I(α) is equivalent to an
encoding based on an F-bounded existential: ∃α ≤ I(α). α,
where I(α) is the type of a record of methods, with α as the
type of each method’s first argument. This connection was
independently discovered by Glew and ourselves [personal
communication, August 2000]. Self application is typable in
this encoding because the object, via subsumption, enjoys two
types: the interface type I(α) and the abstract type α. Crary
encodes precisely the same property as an intersection type:
∃α. α ∧ I(α). Similarly, our encoding is derived by replacing
the F-bound with a higher-order bound and a recursive
type, implementing the bound as a coercion function, and
then eliminating the coercion using row polymorphism. All
three of these encodings are efficient and, we conjecture, fully
abstract. (Crary’s informal argument [7] seems to apply to
all three encodings, though no proof has been given for any
of them.) The primary differences between these encodings
are in the complexity required of the target calculi. In scaling
them to realistic compilers and source languages, other
differences may emerge.

7  Conclusion 

We have developed an efficient encoding of key Java
constructs in a simple, implementable typed intermediate
language. The encoding, after type erasure, has the same
operational behavior as a standard implementation of
self-application. Our strategy extends naturally to a significant
subset of Java. In comparison to our earlier work, we now
support mutual recursion and dynamic cast while retaining
separate compilation. The formal translation using Featherweight
Java allows comprehensible type-preservation proofs
and serves as a starting point for extending the translation
to new features. We have already started implementing this
translation as a new front-end to the SML/NJ compiler.

A Featherweight Java semantics

Syntax:

CL ::= class C ◁ C {(C f;)* K M*}
K  ::= C((C f)*) {super(f*); (this.f = f;)*}
M  ::= C m((C x)*) {return e;}
e  ::= x | e.f | e.m(e*) | new C(e*) | (C)e

Subtyping:

C <: C

CT(C) = class C ◁ B {...}    B <: A
------------------------------------------------------------
C <: A

Field lookup:

fields(Obj) = •

CT(C) = class C ◁ B {C1 f1; ... Cn fn; K ...}
fields(B) = B1 g1 ... Bm gm
------------------------------------------------------------
fields(C) = B1 g1 ... Bm gm, C1 f1 ... Cn fn

Class typing:

K = C(B1 g1 ... Bn gn, C1 f1 ... Cm fm)
      {super(g1 ... gn);
       this.f1 = f1; ... this.fm = fm;}
fields(B) = B1 g1 ... Bn gn
Mi ok in C    ∀i ∈ {1 ... k}
------------------------------------------------------------
class C ◁ B {C1 f1; ... Cm fm; K M1 ... Mk} ok

Method lookup:

CT(C) = class C ◁ B {... K M1 ... Mn}
∃j : Mj = D m(D1 x1 ... Dm xm) {return e;}
------------------------------------------------------------
mtype(m, C) = D1 ... Dm → D
mbody(m, C) = (x1 ... xm, e)

CT(C) = class C ◁ B {... K M1 ... Mn}
m not defined in M1 ... Mn
------------------------------------------------------------
mtype(m, C) = mtype(m, B)
mbody(m, C) = mbody(m, B)

Valid method overriding:

mtype(m, B) = C1 ... Cn → C0
------------------------------------------------------------
override(m, B, C1 ... Cn → C0)

∄ T such that mtype(m, B) = T
------------------------------------------------------------
override(m, B, C1 ... Cn → C0)

Computation:

fields(C) = D1 f1 ... Dn fn
------------------------------------------------------------  (R-Field)
(new C(e1 ... en)).fi → ei

mbody(m, C) = (x1 ... xn, e0)
------------------------------------------------------------  (R-Invk)
(new C(e1 ... em)).m(d1 ... dn) →
    [d1/x1, ..., dn/xn, new C(e1 ... em)/this] e0

C <: D
------------------------------------------------------------  (R-Cast)
(D)new C(e1 ... en) → new C(e1 ... en)

Method typing:

x1 : D1, ..., xn : Dn, this : C ⊢ e ∈ E    E <: D
CT(C) = class C ◁ B {...}
------------------------------------------------------------
D m(D1 x1 ... Dn xn) {return e;} ok in C

Expression typing:

Γ ⊢ x ∈ Γ(x)                                                  (T-Var)

Γ ⊢ e ∈ C    fields(C) = D1 f1 ... Dn fn
------------------------------------------------------------  (T-Field)
Γ ⊢ e.fi ∈ Di

Γ ⊢ e ∈ C    mtype(m, C) = D1 ... Dn → D
Γ ⊢ ei ∈ Ci    Ci <: Di    (∀i ∈ {1 ... n})
------------------------------------------------------------  (T-Invk)
Γ ⊢ e.m(e1 ... en) ∈ D

fields(C) = D1 f1 ... Dn fn
Γ ⊢ ei ∈ Ci    Ci <: Di    (∀i ∈ {1 ... n})
------------------------------------------------------------  (T-New)
Γ ⊢ new C(e1 ... en) ∈ C

Γ ⊢ e ∈ D    D <: C
------------------------------------------------------------  (T-UCast)
Γ ⊢ (C)e ∈ C

Γ ⊢ e ∈ D    C <: D    C ≠ D
------------------------------------------------------------  (T-DCast)
Γ ⊢ (C)e ∈ C

Γ ⊢ e ∈ D    C ≮: D    D ≮: C
------------------------------------------------------------  (T-SCast)
Γ ⊢ (C)e ∈ C


Figure 10: Semantics of Featherweight Java (reprinted from [16], with a few adaptations).